Building a song recommender

Fire up GraphLab Create

In [1]:
import graphlab

Load music data

In [2]:
song_data = graphlab.SFrame('song_data.gl/')
This non-commercial license of GraphLab Create for academic use is assigned to agrawal.pr@husky.neu.edu and will expire on March 12, 2018.
[INFO] graphlab.cython.cy_server: GraphLab Create v2.1 started. Logging: C:\Users\agraw\AppData\Local\Temp\graphlab_server_1492045349.log.0

Explore data

The music data records how many times each user listened to each song, along with the song's details (title and artist).

In [3]:
song_data.head()
Out[3]:
user_id | song_id | listen_count | title | artist | song
b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOAKIMP12A8C130995 | 1 | The Cove | Jack Johnson | The Cove - Jack Johnson
b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOBBMDR12A8C13253B | 2 | Entre Dos Aguas | Paco De Lucia | Entre Dos Aguas - Paco De Lucia ...
b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOBXHDL12A81C204C0 | 1 | Stronger | Kanye West | Stronger - Kanye West
b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOBYHAJ12A6701BF1D | 1 | Constellations | Jack Johnson | Constellations - Jack Johnson ...
b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SODACBL12A8C13C273 | 1 | Learn To Fly | Foo Fighters | Learn To Fly - Foo Fighters ...
b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SODDNQT12A6D4F5F7E | 5 | Apuesta Por El Rock 'N' Roll ... | Héroes del Silencio | Apuesta Por El Rock 'N' Roll - Héroes del ...
b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SODXRTY12AB0180F3B | 1 | Paper Gangsta | Lady GaGa | Paper Gangsta - Lady GaGa
b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOFGUAY12AB017B0A8 | 1 | Stacked Actors | Foo Fighters | Stacked Actors - Foo Fighters ...
b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOFRQTD12A81C233C0 | 1 | Sehr kosmisch | Harmonia | Sehr kosmisch - Harmonia
b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOHQWYZ12A6D4FA701 | 1 | Heaven's gonna burn your eyes ... | Thievery Corporation feat. Emiliana Torrini ... | Heaven's gonna burn your eyes - Thievery ...
[10 rows x 6 columns]
In [4]:
graphlab.canvas.set_target('ipynb')
In [5]:
song_data['song'].show()
In [6]:
len(song_data)
Out[6]:
1116609

Unique users in the dataset

In [7]:
users = song_data['user_id'].unique()
In [8]:
len(users)
Out[8]:
66346

Build a song recommender

In [9]:
train_data,test_data = song_data.random_split(.8,seed=0)
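`random_split(.8, seed=0)` sends roughly 80% of the rows to `train_data`; the training log below reports 893,580 of the 1,116,609 rows, about 80%. As a rough illustration only (a stdlib sketch, not GraphLab's implementation), a seeded split that assigns each row independently with probability 0.8 looks like this. The `random_split` function and the toy `rows` list here are hypothetical.

```python
import random


def random_split(rows, fraction, seed=None):
    """Send each row to the first split with probability `fraction`.

    Hypothetical sketch of a Bernoulli row split: the first split's size is
    only approximately fraction * len(rows), not an exact count.
    """
    rng = random.Random(seed)  # private generator so the split is reproducible
    first, second = [], []
    for row in rows:
        if rng.random() < fraction:
            first.append(row)
        else:
            second.append(row)
    return first, second


# Hypothetical toy (user_id, song, listen_count) rows
rows = [("user%d" % (i % 50), "song%d" % (i % 30), 1) for i in range(1000)]
train, test = random_split(rows, 0.8, seed=0)
print("%d train rows, %d test rows" % (len(train), len(test)))
```

Because rows are split one at a time, the same user can appear in both sets (the training data below still covers 66,085 of the 66,346 users), so the later evaluation measures how well each model recovers held-out listens of mostly known users.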

Simple popularity-based recommender

In [10]:
popularity_model = graphlab.popularity_recommender.create(train_data,
                                                         user_id='user_id',
                                                         item_id='song')
Recsys training: model = popularity
Warning: Ignoring columns song_id, listen_count, title, artist;
    To use one of these as a target column, set target = 
    and use a method that allows the use of a target.
Preparing data set.
    Data has 893580 observations with 66085 users and 9952 items.
    Data prepared in: 0.766688s
893580 observations to process; with 9952 unique items.

Apply the popularity model to make some predictions

A popularity model ranks songs by overall popularity, so it makes essentially the same recommendations for every user and provides no personalization.
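A minimal plain-Python sketch of the idea, assuming that with no target column a song's score is simply the number of training observations that involve it (the whole-number scores in the outputs that follow, such as 4754.0, are consistent with this), and that songs a user has already listened to are left out of that user's list. Both points are assumptions about GraphLab's behavior; the function name and toy observations are hypothetical.

```python
from collections import Counter


def popularity_recommend(observations, user, k=10):
    """Rank songs by how many (user, song) observations they appear in.

    Assumptions: popularity = observation count per song, and songs the
    user already listened to are excluded from their list.
    """
    counts = Counter(song for _, song in observations)
    known = set(song for u, song in observations if u == user)
    ranked = [(song, n) for song, n in counts.most_common() if song not in known]
    return ranked[:k]


# Hypothetical (user_id, song) observations
obs = [
    ("alice", "Undo"), ("bob", "Undo"), ("carol", "Undo"),
    ("alice", "Secrets"), ("bob", "Secrets"),
    ("carol", "Revelry"),
]
print(popularity_recommend(obs, "dave", k=2))   # [('Undo', 3), ('Secrets', 2)]
print(popularity_recommend(obs, "carol", k=2))  # [('Secrets', 2)]
```

Every user's list is cut from the same global ranking, which would explain why the two lists below are nearly identical and differ only where a song drops out.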

In [11]:
popularity_model.recommend(users=[users[0]])
Out[11]:
user_id | song | score | rank
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Sehr kosmisch - Harmonia | 4754.0 | 1
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Undo - Björk | 4227.0 | 2
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | You're The One - Dwight Yoakam ... | 3781.0 | 3
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Dog Days Are Over (Radio Edit) - Florence + The ... | 3633.0 | 4
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Revelry - Kings Of Leon | 3527.0 | 5
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Horn Concerto No. 4 in E flat K495: II. Romance ... | 3161.0 | 6
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Secrets - OneRepublic | 3148.0 | 7
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Fireflies - Charttraxx Karaoke ... | 2532.0 | 8
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Tive Sim - Cartola | 2521.0 | 9
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Drop The World - Lil Wayne / Eminem ... | 2053.0 | 10
[10 rows x 4 columns]
In [12]:
popularity_model.recommend(users=[users[1]])
Out[12]:
user_id | song | score | rank
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | Sehr kosmisch - Harmonia | 4754.0 | 1
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | Undo - Björk | 4227.0 | 2
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | You're The One - Dwight Yoakam ... | 3781.0 | 3
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | Dog Days Are Over (Radio Edit) - Florence + The ... | 3633.0 | 4
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | Revelry - Kings Of Leon | 3527.0 | 5
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | Horn Concerto No. 4 in E flat K495: II. Romance ... | 3161.0 | 6
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | Secrets - OneRepublic | 3148.0 | 7
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | Hey_ Soul Sister - Train | 2538.0 | 8
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | Fireflies - Charttraxx Karaoke ... | 2532.0 | 9
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | Tive Sim - Cartola | 2521.0 | 10
[10 rows x 4 columns]

Build a song recommender with personalization

We now create a model that allows us to make personalized recommendations to each user.
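`item_similarity_recommender` scores a candidate song by how similar it is to the songs the user has already listened to. The sketch below assumes GraphLab's default Jaccard similarity between songs (the number of users who listened to both songs divided by the number who listened to either) and averages that similarity over the user's listening history. We believe this matches the default behavior when no target column is given, but it is a simplified illustration, not GraphLab's implementation; the function names and observations are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / float(len(union)) if union else 0.0


def item_similarity_recommend(observations, user, k=10):
    """Score unseen songs by their average Jaccard similarity to the user's songs."""
    listeners = defaultdict(set)  # song -> set of users who listened to it
    for u, song in observations:
        listeners[song].add(u)
    history = set(song for u, song in observations if u == user)
    if not history:
        return []  # unknown user: a real system would fall back to popularity
    scores = {}
    for candidate, users in listeners.items():
        if candidate in history:
            continue  # do not recommend songs the user already knows
        total = sum(jaccard(users, listeners[h]) for h in history)
        scores[candidate] = total / len(history)
    # Highest score first; ties broken alphabetically so the output is stable
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


# Hypothetical (user_id, song) observations
obs = [
    ("alice", "A"), ("alice", "B"),
    ("bob", "A"), ("bob", "C"),
    ("carol", "B"), ("carol", "C"), ("carol", "D"),
]
print(item_similarity_recommend(obs, "alice"))
```

For alice, song C (score 1/3) ranks above D (score 0.25): C shares a listener with both of her songs, while D overlaps only with B. Because the score depends on each user's own history, different users get different lists.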

In [13]:
personalized_model = graphlab.item_similarity_recommender.create(train_data,
                                                                user_id='user_id',
                                                                item_id='song')
Recsys training: model = item_similarity
Warning: Ignoring columns song_id, listen_count, title, artist;
    To use one of these as a target column, set target = 
    and use a method that allows the use of a target.
Preparing data set.
    Data has 893580 observations with 66085 users and 9952 items.
    Data prepared in: 0.783632s
Training model from provided data.
Gathering per-item and per-user statistics.
+--------------------------------+------------+
| Elapsed Time (Item Statistics) | % Complete |
+--------------------------------+------------+
| 5.962ms                        | 3          |
| 61.5ms                         | 100        |
+--------------------------------+------------+
Setting up lookup tables.
Processing data in one pass using dense lookup tables.
+-------------------------------------+------------------+-----------------+
| Elapsed Time (Constructing Lookups) | Total % Complete | Items Processed |
+-------------------------------------+------------------+-----------------+
| 192.573ms                           | 0                | 0               |
| 1.29s                               | 100              | 9952            |
+-------------------------------------+------------------+-----------------+
Finalizing lookup tables.
Generating candidate set for working with new users.
Finished training in 2.35845s

Applying the personalized model to make song recommendations

As you can see, different users get different recommendations now.

In [14]:
personalized_model.recommend(users=[users[0]])
Out[14]:
user_id | song | score | rank
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Cuando Pase El Temblor - Soda Stereo ... | 0.0194504536115 | 1
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Fireflies - Charttraxx Karaoke ... | 0.0144737317012 | 2
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Love Is A Losing Game - Amy Winehouse ... | 0.0142865960415 | 3
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Marry Me - Train | 0.014133471709 | 4
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Secrets - OneRepublic | 0.013591665488 | 5
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Sehr kosmisch - Harmonia | 0.0133987894425 | 6
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Te Hacen Falta Vitaminas - Soda Stereo ... | 0.0129302831796 | 7
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | OMG - Usher featuring will.i.am ... | 0.0127778282532 | 8
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Y solo se me ocurre amarte (Unplugged) - ... | 0.0123411279458 | 9
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | No Dejes Que... - Caifanes ... | 0.0121042499175 | 10
[10 rows x 4 columns]
In [15]:
personalized_model.recommend(users=[users[1]])
Out[15]:
user_id | song | score | rank
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | Riot In Cell Block Number Nine - Dr Feelgood ... | 0.0374999940395 | 1
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | Sei Lá Mangueira - Elizeth Cardoso ... | 0.0331632643938 | 2
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | The Stallion - Ween | 0.0322580635548 | 3
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | Rain - Subhumans | 0.0314159244299 | 4
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | West One (Shine On Me) - The Ruts ... | 0.0306771993637 | 5
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | Back Against The Wall - Cage The Elephant ... | 0.0301204770803 | 6
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | Life Less Frightening - Rise Against ... | 0.0284431129694 | 7
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | A Beggar On A Beach Of Gold - Mike And The ... | 0.0230024904013 | 8
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | Audience Of One - Rise Against ... | 0.0193938463926 | 9
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | Blame It On The Boogie - The Jacksons ... | 0.0189873427153 | 10
[10 rows x 4 columns]

Apply the model to find songs similar to any song in the dataset
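`get_similar_items` ranks other songs by their similarity to the given song. Assuming the similarity is the default Jaccard measure over each song's set of listeners (an assumption; the scores below are not recomputed here), the ranking can be sketched in plain Python as follows. The `similar_items` function and the observations are hypothetical.

```python
from collections import defaultdict


def similar_items(observations, item, k=10):
    """Rank other songs by Jaccard similarity of their listener sets to `item`."""
    listeners = defaultdict(set)  # song -> set of users who listened to it
    for user, song in observations:
        listeners[song].add(user)
    target = listeners.get(item, set())
    scored = []
    for other, users in listeners.items():
        if other == item:
            continue  # a song is not reported as similar to itself
        union = target | users
        if union:
            scored.append((other, len(target & users) / float(len(union))))
    # Most similar first; ties broken alphabetically for a stable order
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Hypothetical (user_id, song) observations
obs = [
    ("u1", "With Or Without You"), ("u1", "Vertigo"),
    ("u2", "With Or Without You"), ("u2", "Vertigo"), ("u2", "Bad"),
    ("u3", "Bad"), ("u3", "Clocks"),
    ("u4", "Clocks"),
]
print(similar_items(obs, "With Or Without You", k=3))
```

Under this measure, songs whose audiences overlap heavily score high, which is a plausible reading of why the other Buena Vista Social Club tracks score about 0.19 for Chan Chan (Live) while the remaining songs sit near 0.02.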

In [16]:
personalized_model.get_similar_items(['With Or Without You - U2'])
Out[16]:
song | similar | score | rank
With Or Without You - U2 | I Still Haven't Found What I'm Looking For ... | 0.042857170105 | 1
With Or Without You - U2 | Hold Me_ Thrill Me_ Kiss Me_ Kill Me - U2 ... | 0.0337349176407 | 2
With Or Without You - U2 | Window In The Skies - U2 | 0.0328358411789 | 3
With Or Without You - U2 | Vertigo - U2 | 0.0300751924515 | 4
With Or Without You - U2 | Sunday Bloody Sunday - U2 | 0.0271317958832 | 5
With Or Without You - U2 | Bad - U2 | 0.0251798629761 | 6
With Or Without You - U2 | A Day Without Me - U2 | 0.0237154364586 | 7
With Or Without You - U2 | Another Time Another Place - U2 ... | 0.0203251838684 | 8
With Or Without You - U2 | Walk On - U2 | 0.0202020406723 | 9
With Or Without You - U2 | Get On Your Boots - U2 | 0.0196850299835 | 10
[10 rows x 4 columns]
In [17]:
personalized_model.get_similar_items(['Chan Chan (Live) - Buena Vista Social Club'])
Out[17]:
song | similar | score | rank
Chan Chan (Live) - Buena Vista Social Club ... | Murmullo - Buena Vista Social Club ... | 0.188118815422 | 1
Chan Chan (Live) - Buena Vista Social Club ... | La Bayamesa - Buena Vista Social Club ... | 0.18719214201 | 2
Chan Chan (Live) - Buena Vista Social Club ... | Amor de Loca Juventud - Buena Vista Social Club ... | 0.184834122658 | 3
Chan Chan (Live) - Buena Vista Social Club ... | Diferente - Gotan Project | 0.0214592218399 | 4
Chan Chan (Live) - Buena Vista Social Club ... | Mistica - Orishas | 0.0205761194229 | 5
Chan Chan (Live) - Buena Vista Social Club ... | Hotel California - Gipsy Kings ... | 0.0193049907684 | 6
Chan Chan (Live) - Buena Vista Social Club ... | Nací Orishas - Orishas | 0.0191571116447 | 7
Chan Chan (Live) - Buena Vista Social Club ... | Gitana - Willie Colon | 0.018796980381 | 8
Chan Chan (Live) - Buena Vista Social Club ... | Le Moulin - Yann Tiersen | 0.018796980381 | 9
Chan Chan (Live) - Buena Vista Social Club ... | Criminal - Gotan Project | 0.0187793374062 | 10
[10 rows x 4 columns]

Quantitative comparison between the models

We now formally compare the popularity and the personalized models using precision-recall curves.
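For a user with held-out test songs, precision at cutoff k is the fraction of the top-k recommendations that appear among those held-out songs, and recall at k is the fraction of the held-out songs that appear in the top k; the summary tables below average these over the sampled users. A small sketch of these standard definitions follows (GraphLab's handling of edge cases, such as users with fewer than k recommendations, may differ). The function names and inputs are hypothetical.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for one user's ranked list against held-out songs."""
    hits = len(set(recommended[:k]) & set(relevant))
    precision = hits / float(k)
    recall = hits / float(len(relevant)) if relevant else 0.0
    return precision, recall


def mean_precision_recall(per_user, k):
    """Average precision@k and recall@k over users.

    per_user maps user -> (ranked recommendations, held-out test songs).
    """
    pairs = [precision_recall_at_k(recs, rel, k) for recs, rel in per_user.values()]
    n = float(len(pairs))
    return sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n


# Hypothetical rankings and held-out songs for two users
per_user = {
    "u1": (["A", "B", "C"], ["B", "D"]),
    "u2": (["C", "A", "D"], ["E"]),
}
for k in (1, 2, 3):
    print("k=%d precision=%.3f recall=%.3f" % ((k,) + mean_precision_recall(per_user, k)))
```

Raising k can only add hits against a fixed set of held-out songs, so recall never decreases, while precision usually falls as the extra slots are filled with weaker guesses; both summary tables below follow this pattern.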

In [20]:
model_performance = graphlab.compare(test_data, [popularity_model, personalized_model], user_sample=0.05)
graphlab.show_comparison(model_performance,[popularity_model, personalized_model])
compare_models: using 2931 users to estimate model performance
PROGRESS: Evaluate model M0
recommendations finished on 1000/2931 queries. users per second: 18133.7
recommendations finished on 2000/2931 queries. users per second: 17809.6
Precision and recall summary statistics by cutoff
+--------+-----------------+------------------+
| cutoff |  mean_precision |   mean_recall    |
+--------+-----------------+------------------+
|   1    | 0.0303650631184 | 0.00815547279212 |
|   2    | 0.0288297509382 | 0.0153446822331  |
|   3    | 0.0257022631639 | 0.0202926153796  |
|   4    | 0.0237973387922 | 0.0242455703817  |
|   5    | 0.0215626066189 | 0.0274316607991  |
|   6    | 0.0202433754123 | 0.0307677992427  |
|   7    |  0.018959886923 | 0.0336621261903  |
|   8    | 0.0179546229956 |  0.037322382307  |
|   9    | 0.0170590242238 | 0.0397263631399  |
|   10   | 0.0161378369157 | 0.0422611298251  |
+--------+-----------------+------------------+
[10 rows x 3 columns]

PROGRESS: Evaluate model M1
recommendations finished on 1000/2931 queries. users per second: 13477.5
recommendations finished on 2000/2931 queries. users per second: 14559.7
Precision and recall summary statistics by cutoff
+--------+-----------------+-----------------+
| cutoff |  mean_precision |   mean_recall   |
+--------+-----------------+-----------------+
|   1    |  0.199931763903 | 0.0613001073068 |
|   2    |  0.164448993518 | 0.0942752371089 |
|   3    |  0.142840896167 |  0.119453051258 |
|   4    |  0.128283862163 |  0.140886482492 |
|   5    |  0.115523712044 |  0.156535227772 |
|   6    |  0.106732628227 |  0.17330221552  |
|   7    | 0.0987473802213 |  0.186289999184 |
|   8    | 0.0921613783692 |  0.198437417752 |
|   9    | 0.0876833845104 |  0.211520002176 |
|   10   | 0.0827021494371 |  0.220908195058 |
+--------+-----------------+-----------------+
[10 rows x 3 columns]

Model compare metric: precision_recall

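The precision-recall curve and the summary tables above show that the personalized model (M1) performs much better than the popularity model (M0): at cutoff 1 its mean precision is about 0.20 versus 0.03, and at cutoff 10 its mean recall is about 0.22 versus 0.04.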

Unique users who listened to Kanye West's songs

In [23]:
kanye_songs = song_data[song_data['artist'] == 'Kanye West']
In [24]:
kanye_songs
Out[24]:
user_id | song_id | listen_count | title | artist | song
b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOBXHDL12A81C204C0 | 1 | Stronger | Kanye West | Stronger - Kanye West
b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOMLMKI12A81C204BC | 1 | Champion | Kanye West | Champion - Kanye West
5d5e0142e54c3bb7b69f548c2ee55066c90700eb ... | SORFASW12A81C22AE7 | 2 | Stronger | Kanye West | Stronger - Kanye West
537340ff896dea11328910013cfe759413e1eeb3 ... | SOBXHDL12A81C204C0 | 2 | Stronger | Kanye West | Stronger - Kanye West
7dd192c8bd4f27f573cb15e8656442aadd7a9c01 ... | SOOLPFK12A58A7BDE3 | 5 | Flashing Lights | Kanye West | Flashing Lights - Kanye West ...
8fce200f3912e9608e3b1463cdb9c3529aab5c08 ... | SOBXHDL12A81C204C0 | 2 | Stronger | Kanye West | Stronger - Kanye West
8fce200f3912e9608e3b1463cdb9c3529aab5c08 ... | SOIBSWV12A6D4F6AB3 | 1 | Through The Wire | Kanye West | Through The Wire - Kanye West ...
a56bf59af6edc5ae6c92d61ddd214989332864e8 ... | SONGNHO12AB0183915 | 1 | Bad News | Kanye West | Bad News - Kanye West
8fa25e588aeedaa539674babb75729ac9f31f15e ... | SOOLPFK12A58A7BDE3 | 1 | Flashing Lights | Kanye West | Flashing Lights - Kanye West ...
e8612acfb1572297ea0eaaa1f27927d55fdcec65 ... | SOIYWPZ12A81C204EF | 2 | Homecoming | Kanye West | Homecoming - Kanye West
[? rows x 6 columns]
Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.
You can use sf.materialize() to force materialization.
In [30]:
kanye_users = kanye_songs['user_id'].unique()
In [31]:
len(kanye_users)
Out[31]:
2522

Unique users who listened to Foo Fighters songs

In [32]:
foo_songs = song_data[song_data['artist'] == 'Foo Fighters']
In [33]:
foo_songs
Out[33]:
user_id | song_id | listen_count | title | artist | song
b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SODACBL12A8C13C273 | 1 | Learn To Fly | Foo Fighters | Learn To Fly - Foo Fighters ...
b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOFGUAY12AB017B0A8 | 1 | Stacked Actors | Foo Fighters | Stacked Actors - Foo Fighters ...
b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOMSQJY12A8C138539 | 1 | Breakout | Foo Fighters | Breakout - Foo Fighters
b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOVHRGF12A8C13852F | 1 | Generator | Foo Fighters | Generator - Foo Fighters
12768858f6a825452e412deb1df36d2d1d9c6791 ... | SODACBL12A8C13C273 | 4 | Learn To Fly | Foo Fighters | Learn To Fly - Foo Fighters ...
12768858f6a825452e412deb1df36d2d1d9c6791 ... | SOQLUTQ12A8AE48037 | 2 | The Pretender | Foo Fighters | The Pretender - Foo Fighters ...
f47116f998e030f2dab275b81fb2a04a9dc06c33 ... | SOLTAEJ12A8C13F793 | 6 | What If I Do? | Foo Fighters | What If I Do? - Foo Fighters ...
5e161b9e14f303a0cef2d3f44d07dd946549f89f ... | SOLJYEI12A8C13F7B3 | 2 | Friend Of A Friend | Foo Fighters | Friend Of A Friend - Foo Fighters ...
e4c05157f8cebdf3b9d689c441ba97c5ed5db05b ... | SOFGUAY12AB017B0A8 | 1 | Stacked Actors | Foo Fighters | Stacked Actors - Foo Fighters ...
e4c05157f8cebdf3b9d689c441ba97c5ed5db05b ... | SOFZVOT12A8C1408E9 | 1 | Skin And Bones | Foo Fighters | Skin And Bones - Foo Fighters ...
[? rows x 6 columns]
Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.
You can use sf.materialize() to force materialization.
In [34]:
foo_users = foo_songs['user_id'].unique()
In [35]:
len(foo_users)
Out[35]:
2055

Unique users who listened to Taylor Swift's songs

In [36]:
taylor_songs = song_data[song_data['artist'] == 'Taylor Swift']
In [37]:
taylor_songs
Out[37]:
user_id | song_id | listen_count | title | artist | song
169f9f4c68b62d1887c7c0ac99d10a79cfca5daf ... | SOCLMAD12AB017FC09 | 1 | Tim McGraw | Taylor Swift | Tim McGraw - Taylor Swift
81bde1c3a845c64f1677bd9d28f2da85dfefcf30 ... | SOTWSXL12A8C143349 | 1 | Love Story | Taylor Swift | Love Story - Taylor Swift
0152fcbd02b172a874c75a57a913f0f0109ba272 ... | SOSAXUZ12AAF3B2031 | 2 | The Best Day | Taylor Swift | The Best Day - Taylor Swift ...
0152fcbd02b172a874c75a57a913f0f0109ba272 ... | SOSROFB12AAF3B4C5D | 3 | You Belong With Me | Taylor Swift | You Belong With Me - Taylor Swift ...
8cbb5066924ec788e3fea9a4aae59586f46f38fa ... | SOTWSXL12A8C143349 | 1 | Love Story | Taylor Swift | Love Story - Taylor Swift
ea07020bb223c733ccc55aa925ebcc25c4d97377 ... | SOMPTCI12AB017C416 | 13 | Forever & Always | Taylor Swift | Forever & Always - Taylor Swift ...
85d0d381551960608e02df98956277e495b3cf6b ... | SOSROFB12AAF3B4C5D | 3 | You Belong With Me | Taylor Swift | You Belong With Me - Taylor Swift ...
5d5e0142e54c3bb7b69f548c2ee55066c90700eb ... | SOLJTMU12AAF3B4C4D | 2 | Hey Stephen | Taylor Swift | Hey Stephen - Taylor Swift ...
5d5e0142e54c3bb7b69f548c2ee55066c90700eb ... | SORRBVQ12A58A7AA33 | 2 | Change | Taylor Swift | Change - Taylor Swift
5d5e0142e54c3bb7b69f548c2ee55066c90700eb ... | SOTNWCI12AAF3B2028 | 3 | Forever & Always | Taylor Swift | Forever & Always - Taylor Swift ...
[? rows x 6 columns]
Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.
You can use sf.materialize() to force materialization.
In [38]:
taylor_users = taylor_songs['user_id'].unique()
In [39]:
len(taylor_users)
Out[39]:
3246

Unique users who listened to Lady GaGa's songs

In [41]:
gaga_songs = song_data[song_data['artist'] == 'Lady GaGa']
In [42]:
gaga_songs
Out[42]:
user_id | song_id | listen_count | title | artist | song
b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SODXRTY12AB0180F3B | 1 | Paper Gangsta | Lady GaGa | Paper Gangsta - Lady GaGa
4bd88bfb25263a75bbdd467e74018f4ae570e5df ... | SOXGQEM12AB0181D35 | 12 | Speechless | Lady GaGa | Speechless - Lady GaGa
a5cc4c1c78e830b43bba70a8d439ad865ca8026f ... | SOSCIZP12AB0181D2F | 5 | Alejandro | Lady GaGa | Alejandro - Lady GaGa
8cbb5066924ec788e3fea9a4aae59586f46f38fa ... | SOASXQD12AB018902F | 1 | Beautiful_ Dirty_ Rich | Lady GaGa | Beautiful_ Dirty_ Rich - Lady GaGa ...
8cbb5066924ec788e3fea9a4aae59586f46f38fa ... | SOCBQKE12AB018548E | 1 | Teeth | Lady GaGa | Teeth - Lady GaGa
8cbb5066924ec788e3fea9a4aae59586f46f38fa ... | SODXRTY12AB0180F3B | 1 | Paper Gangsta | Lady GaGa | Paper Gangsta - Lady GaGa
8cbb5066924ec788e3fea9a4aae59586f46f38fa ... | SOEYVHS12AB0181D31 | 1 | Monster | Lady GaGa | Monster - Lady GaGa
8cbb5066924ec788e3fea9a4aae59586f46f38fa ... | SOJVYJH12AB0180F4F | 1 | Disco Heaven | Lady GaGa | Disco Heaven - Lady GaGa
8cbb5066924ec788e3fea9a4aae59586f46f38fa ... | SOMONAP12AB0181D21 | 1 | Again Again | Lady GaGa | Again Again - Lady GaGa
8cbb5066924ec788e3fea9a4aae59586f46f38fa ... | SOSCIZP12AB0181D2F | 1 | Alejandro | Lady GaGa | Alejandro - Lady GaGa
[? rows x 6 columns]
Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.
You can use sf.materialize() to force materialization.
In [43]:
gaga_users = gaga_songs['user_id'].unique()
In [44]:
len(gaga_users)
Out[44]:
2928

Of the four artists examined, Taylor Swift has the most unique listeners (3,246), ahead of Lady GaGa (2,928), Kanye West (2,522) and Foo Fighters (2,055).
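The four cells above repeat the same filter-then-`unique()` pattern once per artist. In plain Python, the same distinct-listener counts can be collected for every artist in a single pass by keeping a set of user IDs per artist. The sketch below uses a hypothetical function and toy rows rather than the real SFrame.

```python
from collections import defaultdict


def unique_listeners_by_artist(rows):
    """Count distinct user_ids per artist from (user_id, artist) pairs."""
    users_by_artist = defaultdict(set)
    for user_id, artist in rows:
        users_by_artist[artist].add(user_id)  # a set ignores repeat listens
    return dict((artist, len(users)) for artist, users in users_by_artist.items())


# Hypothetical (user_id, artist) rows; u1 appears twice for Taylor Swift but counts once
rows = [
    ("u1", "Taylor Swift"), ("u2", "Taylor Swift"), ("u1", "Taylor Swift"),
    ("u4", "Taylor Swift"), ("u3", "Kanye West"),
    ("u1", "Lady GaGa"), ("u3", "Lady GaGa"),
]
counts = unique_listeners_by_artist(rows)
print(max(counts, key=counts.get))  # Taylor Swift
```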

Total number of times each artist's songs were played
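The `groupby` cell below sums `listen_count` per artist, and the following cell sorts the totals in descending order. The equivalent aggregation in plain Python is a running total per key followed by a sort; this is a sketch over a hypothetical function and toy rows, not the GraphLab API.

```python
from collections import defaultdict


def total_listens_by_artist(rows):
    """Sum listen_count per artist, sorted from most to least played."""
    totals = defaultdict(int)
    for artist, listen_count in rows:
        totals[artist] += listen_count
    # Largest total first; ties broken alphabetically by artist name
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical (artist, listen_count) rows
rows = [("Kings Of Leon", 5), ("Train", 3), ("Kings Of Leon", 2), ("Coldplay", 4)]
print(total_listens_by_artist(rows))
# [('Kings Of Leon', 7), ('Coldplay', 4), ('Train', 3)]
```

Because this sums listen counts, heavy repeat listeners raise an artist's total, so this ranking need not match the popularity model's scores.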

In [46]:
total_count_by_artist = song_data.groupby(key_columns='artist', operations={'total_count': graphlab.aggregate.SUM('listen_count')})
In [50]:
sort_count = total_count_by_artist.sort('total_count', ascending=False)
In [51]:
sort_count
Out[51]:
artist | total_count
Kings Of Leon | 43218
Dwight Yoakam | 40619
Björk | 38889
Coldplay | 35362
Florence + The Machine | 33387
Justin Bieber | 29715
Alliance Ethnik | 26689
OneRepublic | 25754
Train | 25402
The Black Keys | 22184
[3375 rows x 2 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
In [52]:
sort_count.tail()
Out[52]:
artist | total_count
Aneta Langerova | 38
Kanye West / Talib Kweli / Q-Tip / Common / ... | 38
Jody Bernal | 38
Nâdiya | 36
harvey summers | 31
Diplo | 30
Boggle Karaoke | 30
Beyoncé feat. Bun B and Slim Thug ... | 26
Reel Feelings | 24
William Tabbert | 14
[10 rows x 2 columns]
In [56]:
# Take 9,999 unique test users: the slice [1:10000] skips index 0
subset_test_users = test_data['user_id'].unique()[1:10000]
In [58]:
# Recommend one song (k=1) for each user
recommendations = personalized_model.recommend(subset_test_users, k=1)
recommendations finished on 1000/9999 queries. users per second: 20777.9
recommendations finished on 2000/9999 queries. users per second: 19946.5
recommendations finished on 3000/9999 queries. users per second: 21371.6
recommendations finished on 4000/9999 queries. users per second: 21219.9
recommendations finished on 5000/9999 queries. users per second: 21402.2
recommendations finished on 6000/9999 queries. users per second: 20923.3
recommendations finished on 7000/9999 queries. users per second: 21091.8
recommendations finished on 8000/9999 queries. users per second: 20353.9
recommendations finished on 9000/9999 queries. users per second: 20729.9
In [59]:
recommendations
Out[59]:
user_id | song | score | rank
c067c22072a17d33310d7223d7b79f819e48cf42 ... | Grind With Me (Explicit Version) - Pretty Ricky ... | 0.0459424376488 | 1
f6c596a519698c97f1591ad89f540d76f6a04f1a ... | Hey_ Soul Sister - Train | 0.0238929539919 | 1
696787172dd3f5169dc94deef97e427cee86147d ... | Senza Una Donna (Without A Woman) - Zucchero / ... | 0.017026577677 | 1
3a7111f4cdf3c5a85fd4053e3cc2333562e1e0cb ... | Heartbreak Warfare - John Mayer ... | 0.0298416515191 | 1
532e98155cbfd1e1a474a28ed96e59e50f7c5baf ... | Jive Talkin' (Album Version) - Bee Gees ... | 0.0118288653237 | 1
ee43b175ed753b2e2bce806c903d4661ad351a91 ... | Ricordati Di Noi - Valerio Scanu ... | 0.0305171211561 | 1
e372c27f6cb071518ae500589ae02c126954c148 ... | Fall Out - The Police | 0.0819672048092 | 1
83b1428917b47a6b130ed471b09033820be78a8c ... | Clocks - Coldplay | 0.042858839035 | 1
39487deef9345b1e22881245cabf4e7c53b6cf6e ... | Black Mirror - Arcade Fire ... | 0.0417737685717 | 1
88325c1fc54d4227b223a7ca7c68c2bdc39df54b ... | Sweet Baby James - James Taylor ... | 0.0619834661484 | 1
[9999 rows x 4 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
In [65]:
most_recommended_song = recommendations.groupby(key_columns='song', operations={'total_count': graphlab.aggregate.COUNT()})
In [66]:
most_recommended_song_sorted = most_recommended_song.sort('total_count', ascending=False)
In [67]:
most_recommended_song_sorted
Out[67]:
song | total_count
Undo - Björk | 435
Secrets - OneRepublic | 384
Revelry - Kings Of Leon | 226
You're The One - Dwight Yoakam ... | 161
Fireflies - Charttraxx Karaoke ... | 120
Sehr kosmisch - Harmonia | 95
Horn Concerto No. 4 in E flat K495: II. Romance ... | 94
Hey_ Soul Sister - Train | 89
OMG - Usher featuring will.i.am ... | 63
Dog Days Are Over (Radio Edit) - Florence + The ... | 48
[3144 rows x 2 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
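The last two cells count how many of the 9,999 users received each song as their single recommendation, then sort those counts. A plain-Python equivalent over hypothetical (user, song) pairs uses `collections.Counter`; the `most_recommended` function is illustrative, not part of GraphLab.

```python
from collections import Counter


def most_recommended(recommendations, n=10):
    """Return the n songs recommended to the most users, with their counts."""
    counts = Counter(song for _, song in recommendations)
    return counts.most_common(n)


# Hypothetical (user_id, recommended song) pairs, one recommendation per user
recs = [
    ("u1", "Undo"), ("u2", "Secrets"), ("u3", "Undo"),
    ("u4", "Clocks"), ("u5", "Undo"), ("u6", "Secrets"),
]
print(most_recommended(recs, 2))  # [('Undo', 3), ('Secrets', 2)]
```

Several of the most frequently recommended songs (Undo, Secrets, Revelry) also sit near the top of the popularity model's list, which suggests that widely played songs still account for many users' top personalized pick.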